Minimum Spanning Trees Displaying Semantic Similarity
Abstract
The similarity of the semantic content of web pages is displayed using interactive graphs that present fragments of minimum spanning trees. Homepages of people are analyzed, parsed into XML documents and visualized using TouchGraph LinkBrowser, displaying clusters of people who share common interests. The structure of these graphs is strongly affected by the selection of information used to calculate similarity. The influence of simple selection and Latent Semantic Analysis (LSA) on the structure of such graphs is analyzed. Homepages and lists of publications are converted to word frequency vectors, which are filtered and weighted; the similarity matrix between the normalized vectors is then used to create separate minimum sub-trees showing the clustering of people’s interests. Results show that in this application simple selection of important keywords is as good as LSA but with much lower algorithmic...
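The pipeline described in the abstract (word frequency vectors, a similarity matrix, a minimum spanning tree, then separate sub-trees) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes raw term frequencies with cosine similarity, uses no keyword weighting or LSA, and the sample pages, the helper names and the 0.5 cut-off are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text, keywords=None):
    """Word-frequency vector; optionally keep only selected keywords."""
    words = text.lower().split()
    if keywords is not None:
        words = [w for w in words if w in keywords]
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def minimum_spanning_tree(names, dist):
    """Prim's algorithm on a dense distance matrix; returns (u, v, d) edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge leaving the current tree.
        d, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(len(names))
            if j not in in_tree
        )
        edges.append((names[i], names[j], d))
        in_tree.add(j)
    return edges


# Hypothetical stand-ins for parsed homepages (invented sample data).
pages = {
    "alice": "neural networks learning neural models",
    "bob": "neural networks optimization",
    "carol": "graph theory spanning trees",
    "dave": "graph algorithms spanning trees graph",
}
names = sorted(pages)
vectors = [tf_vector(pages[n]) for n in names]

# Distance = 1 - cosine similarity, so similar pages are close.
dist = [[1.0 - cosine(u, v) for v in vectors] for u in vectors]
mst = minimum_spanning_tree(names, dist)

# Dropping long edges (assumed cut-off 0.5) leaves separate sub-trees.
sub_tree_edges = [edge for edge in mst if edge[2] < 0.5]
for u, v, d in sub_tree_edges:
    print(f"{u} -- {v}  (distance {d:.2f})")
```

Passing a `keywords` set to `tf_vector` is one simple way to mimic the keyword selection the abstract compares against LSA; the choice of cut-off determines how many separate sub-trees remain.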
Similar resources
Clustering Web Search Results with Maximum Spanning Trees
We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves c...
A Novel Algorithm for Meta Similarity Clusters Using Minimum Spanning Tree
The minimum spanning tree clustering algorithm is capable of detecting clusters with irregular boundaries. In this paper we propose two clustering algorithms based on minimum spanning trees. The first algorithm produces k clusters with centers and guaranteed intra-cluster similarity. The second algorithm creates a dendrogram using the k clusters as objects with guaranteed inter-cluster...
Aggregating Entries of Semantic Valence Dictionary of Polish Verbs
This paper presents the phase of work on the semantic valence dictionary of Polish verbs that consists in aggregating entries into semantically coherent sets. Two methods, a simple agglomerative one and a minimal spanning tree method, are discussed and compared. Both methods use a predefined similarity measure of semantic frames.
SOPHIA-TCBR: A knowledge discovery framework for textual case-based reasoning
In this paper, we present a novel textual case-based reasoning system called SOPHIA-TCBR which provides a means of clustering semantically related textual cases where individual clusters are formed through the discovery of narrow themes which then act as attractors for related cases. During this process, SOPHIA-TCBR automatically discovers appropriate case and similarity knowledge. It then is a...
Semantic Relation Extraction Using Penalty Tree Similarity
In the past decades, kernel methods have been enthusiastically explored for relation extraction. This paper proposes a penalty tree similarity algorithm that extends the dependency tree kernel. The dependency tree kernel computes the similarity of two parse trees by enumerating their matched sub-trees. The penalty tree similarity, however, not only considers the similar structures of the parse trees, but ...